Parsing table

This page is about LR parsing tables, there are also LL parsing tables which are substantially different.

In computer science, a parsing table is the part of a parser that makes decisions about how to treat input tokens in compiler development.

Contents

Overview

A parsing table is a table describing what action its parser should take when a given input comes while it is in a given state. It is a tabular representation of a pushdown automaton that is generated from the context-free grammar of the language to be parsed. This is possible because the set of viable prefixes (that is, the sequence of input that can be set aside before a parser needs to either perform a reduction or return an error) of a context-free language is a regular language, so the parser can use a simple parse state table to recognize each viable prefix and determine what needs to be done next.

A parsing table is used with a stack and an input buffer. The stack starts out empty, and symbols are shifted onto it one by one from the input buffer with associated states; in each step, the parsing table is consulted to decide what action to take.

More simply, an Action table defines what to do when a terminal symbol is encountered, and a goto table defines what to do when a nonterminal is encountered.

The parsing table consists of two parts, the action table and the goto table. The action table takes the state at the top of the stack and the next symbol in the input buffer (called the "lookahead" symbol) and returns the action to take, and the next state to push onto the stack. The goto table returns the next state for the parser to enter when a reduction exposes a new state on the top of the stack. The goto table is needed to convert the operation of the deterministic finite automaton of the Action table to a pushdown automaton.

For example, a parsing table might take a current position in state 1 and an input of "+" from the input buffer, and return instructions to shift the current symbol onto the stack and move to state 4. More precisely, the parser would push the token "+" onto the stack, followed by the symbol 4. Or, for an example of the use of a goto table, the current stack might contain "E+E", which is supposed to reduce to "T". After that reduction, looking up "T" in the goto table row corresponding to the state currently at the top of the stack (that is, the state that was under the "E+E" popped off to do the reduction) would tell the parser to push state 2 on top of "T". Action may be a Reduction, a Shift, or Accept or None.

Reduction

Happens when the parser recognizes a sequence of symbols to represent a single nonterminal, and substitutes that nonterminal for the sequence. In an LR parser, this means that the parser will pop 2*N entries from the stack (N being the length of the sequence recognized - this number is doubled because with each token pushed to the stack, a state is pushed on top, and removing the token requires that the parser also remove the state) and push the recognized nonterminal in its place. This nonterminal will not have an associated state, so to determine the next parse state, the parser will use the goto table.
This explanation can be made more precise using the "E+E" example above. Suppose the input buffer is currently empty and the contents of the stack are, from bottom to top:
0 E 1 + 5 E 1
The parser sees that there are no more tokens in the input stream and looks up the empty token in the parsing table. The table contains an instruction, for example, "r4" - this means to use reduction 4 to reduce the contents of the stack. Reduction 4, part of the grammar used to specify the parser's context-free language, should be something along the lines of:
T -> E+E
If the parser were to reduce the "E+E" to a "T", it would pop six symbols from the stack, leaving the stack containing:
0
and then push the new "T", followed by the state number found in the goto table at row 0, column T.

Shift

In this case, the parser makes the choice to shift the next symbol onto the stack. The token is moved from the input buffer to the stack along with an associated state, also specified in the action table entry, which will be the next parser state. The parser then advances to the next token.

Accept

When a parse is complete and the sequence of tokens in question can be matched to an accept state in the parse table, the parser terminates, and may report success in parsing the input. The input in question is valid for the selected grammar.

See also

References